Signal processor

ABSTRACT

A signal processor includes at least one data source (3), a plurality of input registers (11, 12, 13, 14, . . . ) whose inputs are coupled to the data source by data buses (9, 10), a plurality of multipliers (19, 20; 71, 72 . . . ) for multiplying data buffered in the input registers, and a processing arrangement spread over a plurality of data processor branches (4-0, 4-1, . . . , 4-N) for processing products (p0, p1, . . . ), generated by the multipliers by arithmetic and/or logic operations. For achieving enhanced flexibility of the signal processor and increasing the number of possible applications, multiplexers (15, 16, 17, 18; 70) are provided which are used for coupling the multipliers to a respective part of the input registers in dependence on control signals (I, II, III, IV). Such a signal processor is preferably used in mobile radio technology. Further fields of application are, for example, audio, video, medical and automotive technology, ISDN systems, and digital radio.

BACKGROUND OF THE INVENTION

The invention relates to a signal processor comprising

at least one data source,

a plurality of input registers whose inputs are coupled to the data source by data buses,

processing means for processing data buffered in the input registers by arithmetic and/or logic operations, which processing means are spread over a plurality of parallel data processor branches.

Signal processors are specific microprocessors having a high computing speed, whose instruction sets and architectures are attuned to specific requirements in the range of digital signal processing and which are particularly used for implementing complex algorithms in real time. For example, signal processors are used in the field of mobile radio according to the GSM standard, where they are used in mobile radio terminals or radio base stations for implementing complex signal processing algorithms. Further fields of application are, for example, audio, video, medical and automotive technology, as well as DECT systems (Digital European Cordless Telephone), ISDN systems (Integrated Services Digital Network) or digital radio.

From DE-A 43 44 157, a counterpart of which is U.S. Pat. No. 5,799,201, a signal processor of the type defined in the opening paragraph is known. The signal processor described there comprises a plurality of input registers coupled to a data source by two data buses. Only a first part of the input registers is directly connected to the data buses. Data to be processed are transmitted to the second part of the input registers via the first part of the input registers. In this manner, the data transmitted to the second part of the input registers are delayed. The data applied to the input registers are processed in parallel. They are applied to multipliers whose output values (products) are further processed in parallel by means of arithmetic/logic units (ALU) and accumulator registers.

Such signal processors are suitable for the accelerated computation of autocorrelation and cross-correlation functions. Furthermore, faster digital FIR filters can be realized with such signal processors. However, other algorithms, such as, for example, algorithms for determining the Fast Fourier Transform (FFT) or LTP (Long-Term Prediction) algorithms in the field of speech processing, cannot be accelerated with such signal processors, or only to a limited extent.

OBJECTS AND SUMMARY OF THE INVENTION

Therefore, it is an object of the invention to modify the signal processor of the type defined in the opening paragraph so that enhanced flexibility of the signal processor is achieved and the number of possible applications is increased.

The object is achieved in that multiplexing means are provided which are used for coupling the arithmetic and/or logic-operation processing means of the various data processing branches to a respective part of the input registers in dependence on control signals.

The multiplexing means couple an arbitrary number of input registers, which number can be determined by the control signals, to the arithmetic and/or logic-operation processing means spread over the parallel processing branches. More particularly, the data buffered in the input registers are selectively applied to the multipliers arranged in the parallel data processing branches, while the products delivered by the multipliers are further processed by further arithmetic and/or logic-operation processing means. The data source producing the data is, for example, a memory unit; other forms of data sources such as, for example, registers or so-termed I/O ports may also be used here. The required enhanced flexibility is established without a loss of processing power, because the allocation between input registers and the arithmetic and/or logic-operation processing means is no longer fixed, but may be predefined by control signals which, in turn, can easily be adapted to the respective application by means of software. Based on the invention, it is possible to further process, as required, data buffered in an input register in different instruction cycles and different parallel data processing branches, more particularly by letting various multipliers form different products which have at least partly the same factors, without a renewed data transmission by a data bus and without delay elements being necessary. On the one hand, as a result of the avoided data transmissions, there is improved use of the data bus system and, on the other hand, as a result of the parallel signal processing possible with this signal processor, there is enhanced processing capacity.

In an embodiment of the invention, the arithmetic/logic units have inputs for receiving the products generated by the multipliers, and accumulator registers are provided whose inputs are coupled to outputs of the arithmetic/logic units and whose outputs are coupled to inputs of the arithmetic/logic units by feedback paths. Such an arrangement provides that product sums (scalar products) can be computed inexpensively and fast. In many areas, digital signal processing requires the computation of product sums. Examples thereof are mentioned above.

The invention is furthermore embodied in that the input registers are coupled to the data source by a first and a second data bus, in that two multipliers are provided which can be coupled to two of the input registers via the multiplexing means, in that a first part of the input registers is only coupled to the first data bus and in that a second part of the input registers is coupled both to the first and to the second data bus. Such an arrangement is applied, for example, when autocorrelation functions, cross-correlation functions, FIR (Finite Impulse Response) filters or LTP (Long-Term Prediction) algorithms in the field of speech processing are computed, in which sums of products comprising two factors each are formed.

The invention also relates to a mobile radio terminal and a mobile radio base station including a signal processor according to the invention, which processor is used for the digital signal processing in these applications, for example, for implementing speech processing algorithms, for channel coding/decoding, for implementing equalizer functions and/or for processing encryption algorithms. A radio apparatus for digital radio, an ISDN telephone or a DECT system may also advantageously include a signal processor according to the invention for signal processing.

The invention also relates to a method of parallel digital signal processing in which data from at least one data source are transmitted in parallel to a plurality of input registers by a plurality of data buses, and in which method data buffered in the input registers are selectively transmitted in parallel by multiplexing means to arithmetic and/or logic-operation processing means in dependence on control signals. If data buffered in one of the input registers are used by various multipliers in successive instruction cycles for forming a product, many product sums, for example, those for computing autocorrelation functions, can be computed more efficiently. If a multiplier is used for squaring data of an input register, algorithms such as, for example, LTP algorithms can also be computed more efficiently.

These and other aspects of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings:

FIG. 1 shows the block circuit diagram of a signal processor,

FIG. 2 shows a part of the signal processor shown in FIG. 1,

FIG. 3 shows a block diagram of a digital radio telephone including a signal processor shown in FIG. 1 and FIG. 2,

FIG. 4 shows a further aspect of the embodiment shown in FIG. 2, and

FIG. 5 shows a basic structure of a signal processor according to the invention having N+1 parallel branches.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The signal processor shown in FIG. 1 comprises a bus system 1 by which a plurality of function blocks are coupled. The bus system 1 comprises buses for transmitting data, addresses and control signals. For example, the bus system 1 comprises a data bus system and a program bus which are not represented in detail. An addressing unit 2 supplies addresses to a memory unit 3, so that associated memory contents are read out. The memory unit 3 is generally formed by a ROM and/or a RAM. Furthermore, a data processing unit 4 is provided which is used for processing the data read from the memory unit 3 and which comprises a plurality of parallel data processing branches shown in FIGS. 2, 4 and 5. A program memory unit 6 is connected by the program bus to the units connected to the bus system 1.

Furthermore, a peripheral unit 7 is connected to the bus system 1, which peripheral unit comprises input and output units. A control unit 8 is coupled via control lines to the units connected to the bus system 1. The control unit 8 controls the program run and coordinates the access of the units 2 to 7 to the bus system 1.

FIG. 2 shows in more detail part of the signal processor shown in FIG. 1. The memory unit 3, operating as a data source and a data sink, is addressed by the addressing unit 2 with addresses generated by two addressing blocks 2a and 2b. The data addressed by the first addressing block 2a and stored in the memory unit 3 are transmitted by a first data bus 9 (y data bus). The data addressed by the second addressing block 2b and stored in the memory unit 3 are transmitted by a second data bus 10 (x data bus). In accordance with its function, the addressing unit 2 is connected to the memory unit 3 by address lines. The data buses 9 and 10 form part of the bus system 1 shown in FIG. 1. Instead of the memory unit 3, other registers of the signal processor or I/O registers (parts of external interfaces of the signal processor) may also serve as data sources.

Furthermore, the basic structure of two parallel data processing branches 4-0 and 4-1 of the data processing unit 4 (see FIG. 1) is represented. The data processing branch 4-0 includes two input registers 11 (x0) and 12 (y0). The input register 11 is coupled to the data bus 10, the input register 12 to the data buses 9 and 10. Correspondingly, the second data processing branch 4-1 includes two input registers 13 (x1) and 14 (y1), the input register 13 being coupled to the data bus 10 and the input register 14 to the data buses 9 and 10. The input registers 11, 12, 13 and 14 are used, on the one hand, for receiving data transmitted by the data buses 9 and 10. On the other hand, in the event of so-termed interrupts, data buffered in the input registers are transmitted from there by the data buses 9 and 10 to the memory unit 3, stored there in a stack and retransmitted from there to the input registers at a later instant, to be buffered again.

Each processing branch includes two multiplexers. The data processing branch 4-0 includes the multiplexers 15 and 16, the data processing branch 4-1 includes the multiplexers 17 and 18. Each one of the four multiplexers 15 to 18 is coupled to the outputs of the four input registers 11 to 14. Depending on the control signals I, II, III and IV, the multiplexers 15 to 18 switch one of their four input signals, i.e. the respective data buffered in the input registers 11 to 14, through to the multipliers 19 and 20. In accordance with the four switch states of the multiplexers 15 to 18 to be controlled, the control signals are 2-bit signals which are preferably transmitted by an appropriate number of parallel control lines (in this case 2), as in the previous embodiment. The control signals I to IV are provided by the control unit 8 in dependence on a program stored in the program memory unit 6 (see FIG. 1).
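The selection performed by the multiplexers can be illustrated by a minimal sketch in C. The numeric encoding of the 2-bit control signals, the 16-bit data width and the names mux4 and branch_product are assumptions made for illustration only; they are not taken from the embodiment.

```c
#include <assert.h>
#include <stdint.h>

/* Indices of the four input registers 11 to 14 as seen by every multiplexer.
   The numeric encoding of the 2-bit control signals is an assumption. */
enum { REG_X0 = 0, REG_Y0 = 1, REG_X1 = 2, REG_Y1 = 3 };

/* 4:1 multiplexer: passes one of the four register values through,
   selected by a 2-bit control signal. */
static int32_t mux4(const int16_t regs[4], unsigned ctrl)
{
    assert(ctrl < 4u);              /* a 2-bit signal has four states */
    return regs[ctrl];
}

/* One multiplier fed by two multiplexers, e.g. the multiplier 19 fed by
   the multiplexers 15 and 16 under the control signals I and II. */
static int32_t branch_product(const int16_t regs[4],
                              unsigned ctrl_a, unsigned ctrl_b)
{
    return mux4(regs, ctrl_a) * mux4(regs, ctrl_b);
}
```

With regs holding x0, y0, x1 and y1, a call such as branch_product(regs, REG_X1, REG_Y0) would model a branch forming the product x1·y0, and branch_product(regs, REG_X0, REG_X0) a branch squaring x0.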

The multiplier 19 of the first data processing branch 4-0 is used for multiplying the data supplied thereto by the multiplexers 15 and 16. The data switched by the multiplexer 15 form the first factor, the data switched by the multiplexer 16 form the second factor of the products to be produced by the multiplier 19. These products (p0) are buffered in a register 21 coupled to the output of the multiplier 19. This register is only used for safeguarding error-free pipeline processing. In sufficiently fast signal processors, in which multiplication and addition of the same data set can be effected in one cycle, such a register is dispensable. The products p0 produced by the multiplier 19 are applied to a first input of an arithmetic/logic unit 22 (ALU) after being buffered in the register 21. The second input of the arithmetic/logic unit 22 is supplied with data (a0) buffered in an accumulator 23. The accumulator 23 is connected to the output of the arithmetic/logic unit 22 and, in consecutive instruction cycles, is overwritten with the data formed by this unit. The accumulator 23 is coupled to the data buses 9 and 10, so that data buffered in the accumulator 23 can be transmitted to the memory unit 3.

The second data processing branch 4-1 includes a register 24 coupled to the output of the multiplier 20, an arithmetic/logic unit 25 and an accumulator 26, which are arranged and connected in accordance with the description of the first data processing branch 4-0. The data (a1) buffered in the accumulator 26 can also be transmitted to the memory unit 3 either by the data bus 9 or the data bus 10.

Additional embodiment options in the data processing branch 4-0 are denoted by dashed lines 27 and 28. The first dashed line 27 denotes the possibility of supplying the first input of the arithmetic/logic unit 22 with data from one of the input registers 11 to 14, instead of products formed by the multiplier 19. The second dashed line 28 denotes that the second input of the arithmetic/logic unit 22 can also be supplied directly with the data of one of the input registers 11 to 14 instead of the data buffered in the accumulator 23 and fed back to this second input. These embodiment options also hold for the second data processing branch 4-1. The arithmetic/logic units 22 and 25 have the function of an adder in the preferred embodiment. Further functions, such as subtraction and other arithmetic and/or logic operations, may, however, also be realized.

The following Table 1 shows which products p0 and p1 can be generated by the multipliers 19 and 20 from the data x0, y0, x1 and y1 buffered in the input registers 11, 12, 13 and 14.

TABLE 1
______________________________________
Multiplier 19    Multiplier 20
______________________________________
p0=x0·y0         p1=x0·y0
p0=x1·y0         p1=x1·y0
p0=x1·y1         p1=x1·y1
p0=x0·y1         p1=x0·y1
p0=x0·x0         p1=x0·x0
p0=x1·x1         p1=x1·x1
p0=x1·x0         p1=x1·x0
p0=y0·y0         p1=y0·y0
p0=y1·y1         p1=y1·y1
p0=y1·y0         p1=y1·y0
______________________________________
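The ten rows of Table 1 correspond to the unordered pairs, with repetition, that can be formed from four input registers. A minimal sketch in C that counts these pairs is given below; the function name distinct_products is an assumption made for illustration.

```c
#include <assert.h>

/* Counts the distinct products a single multiplier can form when each of
   its two multiplexers may select any of n_regs input registers.
   Multiplication is commutative, so (i, j) and (j, i) are counted once. */
static int distinct_products(int n_regs)
{
    assert(n_regs > 0);
    int count = 0;
    for (int i = 0; i < n_regs; ++i)
        for (int j = i; j < n_regs; ++j)
            ++count;
    return count;
}
```

For the four input registers 11 to 14 the function yields 10, which agrees with the number of rows per multiplier in Table 1.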

For FIR (Finite Impulse Response) filtering and for computing autocorrelation and cross-correlation functions, terms of the following type are to be computed:

$$c(i) = \sum_{j=0}^{N-1} a(j)\, b(j+i), \qquad i = 0, 1, \ldots, M-1$$

With the digital signal processor structure described above it is possible, as described in the following, to compute two neighboring values c(i) in parallel, which leads to an increased computing speed of the signal processor. For example, the value c(0) is formed by adding together the products in the accumulator 23 and the value c(1) by adding together the products in the accumulator 26. The computation of c(0) and c(1) is thus carried out in parallel. Once c(0) and c(1) have been computed, a computation of the values c(2) and c(3) may be carried out in parallel in the next step. Further values c(i) are computed in like fashion until all M values c(i) have been determined.

Said factors a(j) and b(j+i) are stored in the memory unit 3 and transmitted by the data buses 9 and 10 to the input registers 11, 12, 13 and 14 to determine the values c(i). Depending on their destination, the computed values c(i) are transmitted from the accumulators 23 and 26 of the two data processing branches 4-0 and 4-1 by the data buses 9 and/or 10 to the memory unit 3, and stored there. From then on they are available for further signal processing.
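As a point of reference, a value c(i) can be computed sequentially as a plain sum of products. The following sketch in C assumes that each c(i) is the sum of N products a(j)·b(j+i) with j running from 0 to N-1; the function name corr_direct and the use of int and long as data types are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reference computation of one value c(i) as a sum of n products
   a(j)*b(j+i). The caller must provide at least n+i elements in b. */
static long corr_direct(const int *a, const int *b, size_t n, size_t i)
{
    assert(a != NULL && b != NULL);
    long acc = 0;
    for (size_t j = 0; j < n; ++j)
        acc += (long)a[j] * b[j + i];
    return acc;
}
```

Such a sequential loop needs one multiply-accumulate step per product and per value c(i); the two parallel branches described here are intended to produce two neighboring values in the same number of steps.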

The processes in the represented signal processor structure for the computation of the values c(i) will be explained in the following with reference to a program section for computing the values c(0) and c(1). The programming language "C" was used. In the present example, N is assumed to be an integer. The input register 14 (with data y1) shown in FIG. 2 may be omitted for the application represented here. The arrangement described in FIG. 2 is thus modified in the way that only the registers 11, 12 and 13 are provided as input registers.

__________________________________________________________________________
/*******************************/
/* Computation of c[0] and c[1] */
/*******************************/
__________________________________________________________________________
/* Initialization */
py0=&a[0];
px0=&b[0];
a0=0;    /* c[0] is accumulated in a0 */
a1=0;    /* c[1] is accumulated in a1 */

/* Filling of the pipeline, instruction cycles: 1, 2 and 3 */
                                      x0=*px0++;
                                      x1=*px0++,  y0=*py0++;
                p0=x0·y0,  p1=x1·y0,  x0=*px0++,  y0=*py0++;

/* Multiplication and accumulation step, instruction cycles: 4, 5, ..., N+3 */
do N/2 {
a0+=p0,  a1+=p1,  p0=x1·y0,  p1=x0·y0,  x1=*px0++,  y0=*py0++;
a0+=p0,  a1+=p1,  p0=x0·y0,  p1=x1·y0,  x0=*px0++,  y0=*py0++;
}

/* Storing of the results */
c[0]=a0;
c[1]=a1;
__________________________________________________________________________

The four initialization instructions shown here imply first of all that a pointer py0 is set to the address of the first factor a(0) of the factors a(i). Correspondingly, a pointer px0 is set to the address of the first factor b(0) of the factors b(i). The factors a(i) and b(i) are stored in the memory unit 3. In the third and fourth initialization steps, the two accumulators 23 and 26 are set to the values a0=0 and a1=0, respectively.

In the next three steps, the pipeline is filled: the registers 11, 12, 13, 21 and 24 are loaded with first data from the memory unit 3 for a computation of c(0) and c(1). In the first instruction cycle, the factor b(0) is loaded into the register 11 by the data bus 10. In the second instruction cycle, the factor b(1) is loaded into the register 13 by the data bus 10 and the factor a(0) into the register 12 by the data bus 9. In the third instruction cycle, the product p0 (product of the factors b(0) and a(0) buffered in the registers 11 and 12) produced by the multiplier 19 is loaded into the register 21 and the product p1 (product of the factors b(1) and a(0)) produced by the multiplier 20 is loaded into the register 24. In the same instruction cycle, the factor b(2) from the memory unit 3 is loaded into the register 11 by the data bus 10 and the factor a(1) into the register 12 by the data bus 9.

The next instruction cycles, 4, 5, . . . , N+3, comprise a program loop which includes two instruction cycles and which is executed N/2 times. In the first instruction cycle of the program loop, the contents of the register 21 (product p0) are added to the contents of the accumulator 23. The contents of the register 24 (product p1) are added to the contents of the accumulator 26. Subsequently, the two registers 21 and 24 are overwritten with the products of the contents of the registers 13 (x1) and 12 (y0), and of the registers 11 (x0) and 12 (y0), respectively. Subsequently, a new factor b(i) is loaded into the register 13 by the data bus 10 and a new factor a(i) into the register 12 by the data bus 9.

The second instruction cycle of the program loop comprises first, in accordance with the first instruction cycle of the program loop, an updating of the contents of the accumulators 23 and 26 in that the products p0 and p1 are added to them. Then, new products p0 and p1, whose factors are formed from the contents of the registers 11 and 12 for the product p0 and of the registers 13 and 12 for the product p1, are loaded into the registers 21 and 24. Subsequently, the next factor b(i) is loaded into the register 11 (x0) and the next factor a(i) is loaded into the register 12 by the data buses 10 and 9, respectively.

Once the program loop has been run through N/2 times, the contents a0 of the accumulator 23 correspond to the value c(0) and the contents a1 of the accumulator 26 correspond to the value c(1). These values are stored in the memory unit 3 and may be loaded from there for further signal processing purposes. The further values c(i) are computed in accordance with the method described above. The following Table 2 illustrates the sequence of instruction cycles for determining values c(i), which represent product sums or scalar products.

TABLE 2
__________________________________________________________________________
Instr.
cycle  x0      x1      y0      p0             p1            a0                       a1
__________________________________________________________________________
1      b(0)    ?       ?       ?              ?             0                        0
2      b(0)    b(1)    a(0)    ?              ?             0                        0
3      b(2)    b(1)    a(1)    b(0)•a(0)      b(1)•a(0)     0                        0
4      b(2)    b(3)    a(2)    b(1)•a(1)      b(2)•a(1)     b(0)•a(0)                b(1)•a(0)
5      b(4)    b(3)    a(3)    b(2)•a(2)      b(3)•a(2)     Σ[j=0..1] b(j)•a(j)      Σ[j=0..1] b(j+1)•a(j)
6      b(4)    b(5)    a(4)    b(3)•a(3)      b(4)•a(3)     Σ[j=0..2] b(j)•a(j)      Σ[j=0..2] b(j+1)•a(j)
.      .       .       .       .              .             .                        .
.      .       .       .       .              .             .                        .
.      .       .       .       .              .             .                        .
N+2    b(N)    b(N+1)  a(N)    b(N-1)•a(N-1)  b(N)•a(N-1)   Σ[j=0..N-2] b(j)•a(j)    Σ[j=0..N-2] b(j+1)•a(j)
N+3    b(N+2)  b(N+1)  a(N+1)  b(N)•a(N)      b(N+1)•a(N)   Σ[j=0..N-1] b(j)•a(j)    Σ[j=0..N-1] b(j+1)•a(j)
__________________________________________________________________________

The contents of the registers 11 (x0), 13 (x1), 12 (y0), 21 (p0), 24 (p1) and of the accumulators 23 (a0) and 26 (a1) in the individual instruction cycles become apparent from Table 2.

With the aid of the program section explained above and Table 2 it becomes clear that the factors (x0) and (x1) buffered in the input registers 11 and 13 are each used in two successive instruction cycles for computing different products. For example, the factors buffered in these two input registers are used in successive instruction cycles both by the multiplier 19 to form the product (p0) and by the multiplier 20 to form the product (p1). It is not necessary to load new factors simultaneously into the input registers 11 and 13 in each instruction cycle; it is sufficient to load new factors alternately into these two input registers by the data bus 10.
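The alternating use of the input registers 11 and 13 can be checked with a cycle-by-cycle model written in ordinary C. The sketch below is an illustration under stated assumptions: the parallel operations of one instruction cycle are executed one after the other in an order that preserves their effect, the second loop cycle forms p0 from the registers 11 and 12 and p1 from the registers 13 and 12 as described above, N is assumed to be even, and the function name corr_two_branch and the data types are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Cycle-by-cycle model of the two-branch program section.
   n must be even. Because the pipeline prefetches past the last factor
   that is actually used, a must hold at least n+2 elements and b at
   least n+3 elements. Results: out[0] = c(0), out[1] = c(1). */
static void corr_two_branch(const int *a, const int *b, size_t n, long out[2])
{
    assert(n % 2 == 0);
    const int *py0 = a, *px0 = b;
    long a0 = 0, a1 = 0, p0, p1;
    int x0, x1, y0;

    /* Filling of the pipeline, instruction cycles 1 to 3 */
    x0 = *px0++;
    x1 = *px0++;  y0 = *py0++;
    p0 = (long)x0 * y0;  p1 = (long)x1 * y0;  x0 = *px0++;  y0 = *py0++;

    for (size_t k = 0; k < n / 2; ++k) {
        /* First loop cycle: x1 feeds branch 4-0, x0 feeds branch 4-1 */
        a0 += p0;  a1 += p1;
        p0 = (long)x1 * y0;  p1 = (long)x0 * y0;
        x1 = *px0++;  y0 = *py0++;

        /* Second loop cycle: x0 feeds branch 4-0, x1 feeds branch 4-1 */
        a0 += p0;  a1 += p1;
        p0 = (long)x0 * y0;  p1 = (long)x1 * y0;
        x0 = *px0++;  y0 = *py0++;
    }
    out[0] = a0;
    out[1] = a1;
}
```

In this model only one new factor b(i) is fetched per cycle over the x data bus, yet each fetched factor contributes to two products, once in each branch.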

Another embodiment of the invention relates to a so-termed LTP (Long-Term Prediction) algorithm for a pitch lag search in speech coding algorithms. The pitch lag search is used for determining a speaker's instantaneous fundamental period. It is used, for example, in speech coding in GSM mobile radio systems.

In line with the LTP algorithm, autocorrelation functions and energy values having the form

$$c(L) = \sum_{n=0}^{N-1} y(n)\, y(n-L), \qquad g(L) = \sum_{n=0}^{N-1} y^{2}(n-L)$$

can be computed. The formula for determining values c(L) relates to the computation of autocorrelation functions, the formula for determining values g(L) relates to determining energy values. The optimum value for the parameter L (pitch lag) is found by determining the maximum of the expression c²(L)/g(L) over the values of L in the range between L_min and L_max. This post-processing, however, is unessential to the principle of the invention.
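The search for the pitch lag can be sketched sequentially in C as follows. The sketch assumes that c(L) is the sum of the products y(n)·y(n-L) and g(L) the sum of the squares of y(n-L) for n from 0 to N-1, which is what the program section discussed hereafter accumulates. The function name best_pitch_lag, the use of floating-point arithmetic and the skipping of lags with g(L)=0 are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the lag L in [l_min, l_max] that maximizes c(L)^2 / g(L).
   y points at sample y(0); y[-l_max] .. y[n-1] must be valid.
   Lags with g(L) == 0 are skipped. Returns -1 if no lag qualifies. */
static int best_pitch_lag(const int *y, int n, int l_min, int l_max)
{
    assert(y != NULL && n > 0 && 0 < l_min && l_min <= l_max);
    int best = -1;
    double best_score = -1.0;
    for (int lag = l_min; lag <= l_max; ++lag) {
        double c = 0.0, g = 0.0;
        for (int k = 0; k < n; ++k) {
            c += (double)y[k] * y[k - lag];          /* correlation term */
            g += (double)y[k - lag] * y[k - lag];    /* energy term */
        }
        if (g > 0.0 && c * c / g > best_score) {
            best_score = c * c / g;
            best = lag;
        }
    }
    return best;
}
```

For a signal that repeats every three samples, a search over the lags 2 to 5 would be expected to return 3, because the delayed and undelayed segments then coincide.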

For the same value of the parameter L, both c(L) and g(L) can be simultaneously computed by the signal processor further explained with reference to FIG. 2. This will become evident from the program section to be described hereafter. Again the programming language "C" was used for the representation. The input registers 13 and 14 are not used.

__________________________________________________________________________
/*******************************/
/* Computation of c[L] and g[L] */
/*******************************/
__________________________________________________________________________
/* Initialization */
py0=&y[0];
px0=&y[-L];
a0=0;    /* g[L] is accumulated in a0 */
a1=0;    /* c[L] is accumulated in a1 */

/* Filling of the pipeline, instruction cycles: 1 and 2 */
                                      x0=*px0++,  y0=*py0++;
                p0=x0·x0,  p1=x0·y0,  x0=*px0++,  y0=*py0++;

/* Multiplication and accumulation step, instruction cycles: 3, 4, ..., N+2 */
do N {
a0+=p0,  a1+=p1,  p0=x0·x0,  p1=x0·y0,  x0=*px0++,  y0=*py0++;
}

/* Storing of the results */
g[L]=a0;
c[L]=a1;
__________________________________________________________________________

The four initialization steps comprise, on the one hand, setting the two pointers py0 and px0 to the addresses of the elements y(0) and y(-L) of an array that contains the values y(i). On the other hand, the accumulators 23 and 26 are set to the values a0=0 and a1=0.

Subsequently, the pipeline is filled in two instruction cycles, i.e. the registers 11, 12, 21 and 24 are loaded with first values (x0, y0, p0, p1). Values y(-L), y(1-L), y(2-L), . . . from the memory unit 3 are loaded into the input register 11 by the data bus 10. Values y(0), y(1), y(2), . . . from the memory unit 3 are loaded into the input register 12 by the data bus 9. Products formed by the multipliers 19 and 20 are loaded into the registers 21 and 24: the multiplier 19 forms the square of the value (x0) stored in the register 11, and the multiplier 20 forms the product of the values (x0, y0) buffered in the input registers 11 and 12. The filling of the pipeline is effected in the program cycles 1 and 2.

The program cycles 3, 4, . . . , N+2 correspond to N passes through a program loop. Each pass corresponds to one instruction cycle, in which first the product p0 buffered in the register 21 is added to the contents a0 of the accumulator 23. At the same time, the contents a1 of the accumulator 26 are updated by adding the product p1 buffered in the register 24. Subsequently, new products p0 and p1 formed by the multipliers 19 and 20 are loaded into the registers 21 and 24, where the product p0 is the square of the contents x0 of the register 11 and the product p1 is the product of the values x0 and y0 buffered in the input registers 11 and 12. Subsequently, new values y(n-L) and y(n), respectively, are loaded into the input registers 11 and 12.

Once the program loop has been run through N times, the values a0 and a1 present in the accumulators 23 and 26 are transmitted via the data buses 9 and 10 to the memory unit 3 as result values g(L) and c(L), respectively, and stored there. The following Table 3 is intended to explain this example. The rows of this Table correspond to the respective instruction cycles, to which are assigned the contents x0, y0, p0, p1, a0 and a1 of the registers 11, 12, 21 and 24 and of the accumulators 23 and 26.

                                      TABLE 3
    __________________________________________________________________________
    Instr.
    cycle   x0            y0        p0                  p1              a0           a1
    __________________________________________________________________________
    1       y(-L)         y(0)      ?                   ?               0            0
    2       y(1 - L)      y(1)      y(-L)•y(-L)         y(-L)•y(0)      0            0
    3       y(2 - L)      y(2)      y(1 - L)•y(1 - L)   y(1 - L)•y(1)   y(-L)•y(-L)  y(-L)•y(0)
    4       y(3 - L)      y(3)      y(2 - L)•y(2 - L)   y(2 - L)•y(2)   ##STR9##     ##STR10##
    5       y(4 - L)      y(4)      y(3 - L)•y(3 - L)   y(3 - L)•y(3)   ##STR11##    ##STR12##
    .       .             .         .                   .               .            .
    .       .             .         .                   .               .            .
    .       .             .         .                   .               .            .
    N + 2   y(N + 1 - L)  y(N + 1)  y(N - L)•y(N - L)   y(N - L)•y(N)   ##STR13##    ##STR14##
    __________________________________________________________________________

It will be recognized that the contents x0 of the input register 11 are used both for the formation of the product by the multiplier 19 (p0) and for the formation of the product by the multiplier 20 (p1). The arrangement described in the state of the art mentioned in the opening paragraph is unsuitable for computing values c(L) and g(L) in parallel. The signal processor according to the invention thus makes it possible to provide a more flexible parallel signal processing.
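The data flow of the loop above can be mirrored in ordinary C to check it against a direct summation. The following is only a minimal sketch under stated assumptions, not the implementation of the signal processor: the names corr_result, corr_pipelined and corr_direct, the int sample type and the long accumulators are chosen here for illustration. The parallel operations of one instruction cycle are written in sequence, ordered so that every operation still reads the register contents of the previous cycle.

```c
#include <assert.h>

/* Hypothetical result type: g = sum of y(n-L)^2, c = sum of y(n)*y(n-L),
 * each taken over n = 0 .. N-1. */
typedef struct { long g; long c; } corr_result;

/* Register-level model of the program section. y points at element y(0)
 * of a buffer that must hold y(-L) .. y(N+1), because the pipeline
 * prefetches operands that are never accumulated. */
corr_result corr_pipelined(const int *y, int L, int N)
{
    const int *py0 = y;        /* py0=&y[0]  */
    const int *px0 = y - L;    /* px0=&y[-L] */
    long a0 = 0, a1 = 0;       /* accumulators 23 and 26 */
    long p0, p1;               /* product registers 21 and 24 */
    int x0, y0;                /* input registers 11 and 12 */

    /* Cycle 1: first operands. */
    x0 = *px0++; y0 = *py0++;
    /* Cycle 2: first products, then the next operands. */
    p0 = (long)x0 * x0; p1 = (long)x0 * y0;
    x0 = *px0++; y0 = *py0++;

    /* Cycles 3 .. N+2: accumulate, multiply, load. */
    for (int n = 0; n < N; n++) {
        a0 += p0; a1 += p1;
        p0 = (long)x0 * x0; p1 = (long)x0 * y0;
        x0 = *px0++; y0 = *py0++;
    }
    return (corr_result){ a0, a1 };
}

/* Straightforward reference computation of the same two sums. */
corr_result corr_direct(const int *y, int L, int N)
{
    corr_result r = { 0, 0 };
    for (int n = 0; n < N; n++) {
        r.g += (long)y[n - L] * y[n - L];
        r.c += (long)y[n] * y[n - L];
    }
    return r;
}
```

Both sums use the same operand x0 in every cycle, which is why the pipelined version needs the contents of the input register 11 at both multipliers at once.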

A further advantageous application of the signal processor according to the invention lies in the computation of sums for which squares of values are added together. This corresponds to determining the scalar product of two identical vectors. More particularly, the computation of energy values of signals is carried out with sums like these. A computation of energy is effected according to the formula ##EQU3##

If N represents an even number, the computation of the energy e may be effected by computing two partial sums e₀ and e₁ according to the formula ##EQU4##
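The two formulas are not reproduced in this text. One reading that is consistent with the program section for e0 and e1, which starts its two pointers at x[0] and x[N/2], is sketched here; the zero-based index ranges are an assumption and may differ from the original formulas.

```latex
e = \sum_{i=0}^{N-1} x(i)^2 ,
\qquad
e = e_0 + e_1 ,
\quad
e_0 = \sum_{i=0}^{N/2-1} x(i)^2 ,
\quad
e_1 = \sum_{i=N/2}^{N-1} x(i)^2
```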

The two partial sums can be determined simultaneously in N/2+2 instruction cycles by processing the signals in parallel in the two data processing branches 4-0 and 4-1 as shown in FIG. 2. The following program section (programming language "C") shows the computation of the energy e via a parallel computation of the two partial sums e₀ and e₁.

    __________________________________________________________________________
    /****************************/
    /* Computation of e0 and e1 */
    /****************************/
    __________________________________________________________________________
    /* Initialization */
    px0=&x[0];
    py0=&x[N/2];
    a0=0;    /* e0 is accumulated in a0 */
    a1=0;    /* e1 is accumulated in a1 */
    /* Filling of the pipeline, instruction cycles: 1 and 2 */
                                        x0=*px0++, y0=*py0++;
                    p0=x0·x0, p1=y0·y0, x0=*px0++, y0=*py0++;
    /* Multiplication and accumulation step, instruction cycles: 3, 4, ..., N/2+2 */
    do N/2 {
    a0+=p0, a1+=p1, p0=x0·x0, p1=y0·y0, x0=*px0++, y0=*py0++;
    }
    /* Addition of the partial sums e0 and e1, instruction cycle: N/2+3 */
    a0+=a1;    /* e=e0+e1 */
    /* Storing of the results */
    e=a0;
    __________________________________________________________________________

Only the input registers 11 and 12 are used for the computation. The input registers 13 and 14 are not used. In this example, the multiplier 19 forms products p0 which correspond to the square of the respective contents x0 of the input register 11. The multiplier 20 accordingly forms squares from the values y0 buffered in the input register 12.

In this example, the memory contents of the accumulators 23 and 26 are finally added together to determine the energy from the two partial sums e₀ and e₁. For carrying out this operation, the signal processor structure of FIG. 2 can be modified to correspond to that of FIG. 4.
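The energy computation can be followed step by step in a plain C model of the two branches. This is a hedged sketch rather than the processor's actual behavior: the function name energy_two_branches and the int and long types are assumptions, and the operations of one instruction cycle are serialized so that each one reads the register contents of the previous cycle.

```c
#include <assert.h>

/* Model of the program section for e0 and e1. N must be even, and x must
 * hold N+2 elements: the last two loads of the second branch fetch
 * x[N] and x[N+1], which are loaded but never accumulated. */
long energy_two_branches(const int *x, int N)
{
    assert(N % 2 == 0);

    const int *px0 = x;            /* px0=&x[0]   */
    const int *py0 = x + N / 2;    /* py0=&x[N/2] */
    long a0 = 0, a1 = 0;           /* partial sums e0 and e1 */
    long p0, p1;                   /* product registers */
    int x0, y0;                    /* input registers 11 and 12 */

    /* Cycle 1: first operands of both branches. */
    x0 = *px0++; y0 = *py0++;
    /* Cycle 2: first squares, then the next operands. */
    p0 = (long)x0 * x0; p1 = (long)y0 * y0;
    x0 = *px0++; y0 = *py0++;

    /* Cycles 3 .. N/2+2: each branch accumulates one square per cycle. */
    for (int n = 0; n < N / 2; n++) {
        a0 += p0; a1 += p1;
        p0 = (long)x0 * x0; p1 = (long)y0 * y0;
        x0 = *px0++; y0 = *py0++;
    }

    /* Cycle N/2+3: e = e0 + e1. */
    a0 += a1;
    return a0;
}
```

With the samples 1, 2, 3, 4 followed by two padding elements, the first branch accumulates 1 + 4, the second 9 + 16, and the final addition yields 30.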

The block diagram shown in FIG. 3 of a digital radio telephone comprises a send and a receive path. The speech signals received from a microphone 30 are converted into binary coded data words in an analog/digital converter 31. These data words are applied to a signal processor 32. For the various functions performed by the signal processor 32, the blocks 33 to 39 are shown in the signal processor 32 in FIG. 3. With the data words generated by the analog/digital converter 31, a speech coding is carried out in block 33, after which a channel coding is carried out in block 34 and, subsequently, an encryption in block 35. These encrypted data words are GMSK modulated in a modulator 40. This modulator 40 is connected to an output of the signal processor 32. Subsequently, the modulated digital signals are converted into analog modulated signals in a digital/analog converter 41. These modulated analog signals are applied to a transmission circuit 42 which generates radio signals transmitted by an antenna 43. The path described thus far represents the send path of the digital radio telephone.

The receive path of the digital radio telephone will be described in the following. Analog radio signals received from an antenna 44 are processed in a receive circuit 45 and analog modulated signals are applied to an analog/digital converter 46. The digitally modulated signals produced by the analog/digital converter are demodulated in a demodulator 47 and applied to the signal processor 32. The block 39 in the signal processor 32 is to show the subsequent equalization of the modulated signals. Then a decryption function is performed, symbolized by block 38. After a channel decoding in block 37 and a speech decoding in block 36, the signal processor 32 applies digital data words to a digital/analog converter 48 which passes the analog speech signals on to a loudspeaker 49.

The signal processor 32 is applicable not only in a radio telephone in a mobile station of a mobile radio system, but also in a base station of such a system. The signal processor structure explained in FIG. 2 is further not restricted to signal processors. For example, the structure may also be realized in microcomputers or chips, respectively, specifically developed for radio equipment of a mobile radio system (mobile and base stations). The processor structure according to the invention may further also be used, for example, in DECT systems, ISDN telephones or radio equipment for digital radio.

In the signal processor structure shown in FIG. 4, multiplexers 50, 51, 52 and 53 are provided. In dependence on control signals produced by the control unit 8 (see FIG. 1) and supplied to these multiplexers through control inputs 54, 55, 56 and 57, these multiplexers couple either one of the outputs of the arithmetic/logic units 22 and 25, or one of the data buses 9 and 10, to the accumulators 23 and 24 and two further accumulators 58 and 59. The output of the multiplexer 50 is then coupled to the accumulator 23, the output of the multiplexer 51 to the accumulator 24, the output of the multiplexer 52 to the accumulator 58 and the output of the multiplexer 53 to the accumulator 59. In dependence on control signals applied to the multiplexer 60 through a control input 61 by the control unit 8, one of the outputs of the accumulators 23, 24, 58 and 59 is fed back to one of the inputs of the arithmetic/logic unit 22, similarly to the feedback in FIG. 2. In dependence on control signals transmitted by the control unit 8 to a multiplexer 62 through a control input 63, one of the outputs of the accumulators 23, 24, 58 and 59 is coupled to an input of the arithmetic/logic unit 25 via the multiplexer 62, similarly to the procedure with multiplexer 60. The feedback to inputs of the arithmetic/logic units 22 and 25 shown in FIG. 2 is thus extended in such a way that an accumulator is not fixedly assigned to an arithmetic/logic unit, but that the assignment between arithmetic/logic units and accumulators is variable.

Additionally, the data buses 9 and 10 may also be coupled to the accumulators 23, 24, 58 and 59 via the multiplexers 50, 51, 52 and 53. When the accumulators form the sums, summands can thus also be fetched directly from the memory unit 3 by means of said multiplexers, transferred to the accumulators 23, 24, 58 and 59 and taken into account when a sum is formed. Values stored in the accumulators are applied to the data buses 10 and 9 via multiplexers 64 and 65 to be transmitted to the memory unit 3 and be stored there.

The multiplexers 64 and 65 also have control inputs, referenced 66 and 67, through which the respective control signals from the control unit 8 are applied to these two multiplexers. In dependence on these control signals, the data buses 10 and 9 are coupled to respective accumulator outputs.

The control signals applied to the multiplexers 50, 51, 52, 53, 60, 62, 64 and 65 are 2-bit signals, which make it possible to set each multiplexer to one of its four possible switch states.

With such an arrangement it is possible to add the two partial sums e₀ and e₁ together in the computation of the energy described above, by feeding back the output of the accumulator 24 via the multiplexer 60 to the respective input of the arithmetic/logic unit 22, and thus by adding the two partial sums to the memory contents a0 of the accumulator 23. With the described arrangement shown in FIG. 4, however, many other combinations of contents of the accumulators 23, 24, 58 and 59 may be provided, while data stored in the memory unit 3 can also be taken into account.
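The selectable feedback can be pictured with a small C model. This is purely an illustrative sketch: the text fixes only that the control signals are 2 bits wide, so the encoding in acc_select, the type acc_bank and the functions feedback_mux and add_partial_sums are hypothetical, and the model reduces the arithmetic/logic unit 22 to the single addition needed for the energy example.

```c
#include <assert.h>

/* Hypothetical 2-bit control encoding for the feedback multiplexer;
 * the source does not specify which value selects which accumulator. */
enum acc_select { SEL_ACC23 = 0, SEL_ACC24 = 1, SEL_ACC58 = 2, SEL_ACC59 = 3 };

/* The four accumulators 23, 24, 58 and 59 of FIG. 4, indexed by acc_select. */
typedef struct { long acc[4]; } acc_bank;

/* Multiplexer 60 (or 62): routes one of the four accumulator outputs to an
 * input of an arithmetic/logic unit, depending on the 2-bit control value. */
long feedback_mux(const acc_bank *bank, unsigned sel)
{
    assert(sel < 4u);   /* four switch states */
    return bank->acc[sel];
}

/* Energy example: the partial sum held in accumulator 24 is fed back via
 * multiplexer 60 and added to the contents a0 of accumulator 23. */
void add_partial_sums(acc_bank *bank)
{
    long fed_back = feedback_mux(bank, SEL_ACC24);
    bank->acc[SEL_ACC23] += fed_back;
}
```

Because the selection is a run-time value rather than fixed wiring, the same addition step could combine any other pair of accumulators by changing the control value.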

FIG. 5 shows a generalization of the invention to N+1 parallel data processing branches 4-0, 4-1, . . . , 4-N. The first two data processing branches are the data processing branches 4-0 and 4-1 of FIG. 2 and FIG. 4, respectively. A further N-1 data processing branches are connected in parallel. The input registers of the data processing branches are all coupled to both data bus 9 and data bus 10. However, it is also possible to apply a limitation, in that a part of the input registers is coupled to only one of the data buses. The multiplexers 15, 16, 17 and 18 of the first two data processing branches 4-0 and 4-1, and the respective multiplexers of the further N-1 data processing branches, are combined into a function block 70. A function block 71 comprises the multiplier 19, the register 21 and the arithmetic/logic unit 22 of the first data processing branch 4-0. Accordingly, a function block 72 comprises the multiplier 20, the register 24 and the arithmetic/logic unit 25 of the second data processing branch 4-1. The respective multipliers, registers and arithmetic/logic units of the further data processing branches 4-2, 4-3, . . . , 4-N are combined accordingly. In dependence on control signals, the input registers may optionally be coupled to one or more inputs of said multipliers via the combined multiplexing means in function block 70. A function block 73 combines the multiplexers 50, 51, 52, 53, 60 and 62 of the first two data processing branches 4-0 and 4-1 shown in FIG. 4, the respective multiplexers of further data processing branches, and the feedback paths from the accumulators of the data processing branches to inputs of the arithmetic/logic units of the data processing branches. In dependence on control signals produced by the control unit 8, it is thus possible to couple the outputs of the arithmetic/logic units or also the data buses 9 and 10 optionally to one or more inputs of the accumulators (a0, a1, . . . , aN). In dependence on control signals, it is also possible to establish feedback paths from the accumulators to the arithmetic/logic units. The signal processor structure shown in FIG. 5 may, however, be arranged so that the arithmetic/logic units and the accumulators of the individual data processing branches follow the signal processor structure shown in FIG. 2.

The invention may obviously also be extended to more than two data buses. However, this is not shown for clarity. More particularly in the field of video signal processing, for example, when two-dimensional fast Fourier transforms are to be carried out, an extension to more than two data buses will be useful.

Another example of the generalization shown in FIG. 5 will be explained below, in which values c(i) can be computed according to the formula ##EQU5## These values c(i) may be output values of a FIR filter, of an autocorrelation or of a cross-correlation. In this example, two data buses 9 and 10 are used as in previous examples. Furthermore, three multipliers are necessary, three input registers (x0, x1, x2) coupled to the data bus 10, one input register (y0) coupled to the data bus 9, and three accumulator registers (a0, a1, a2), and thus three data processing branches 4-0, 4-1 and 4-2. With such a configuration, three terms for the values c(i) can be computed simultaneously in N+4 instruction cycles. The computation will now be explained with reference to a further program section (programming language "C").

    __________________________________________________________________________
    /**************************************/
    /* Computation of c[0], c[1] and c[2] */
    /**************************************/
    __________________________________________________________________________
    /* Initialization */
    py0=&a[0];
    px0=&b[0];
    a0=0;     /* c[0] is accumulated in a0 */
    a1=0;     /* c[1] is accumulated in a1 */
    a2=0;     /* c[2] is accumulated in a2 */
    /* Filling of the pipeline, instruction cycles: 1, 2, 3 and 4 */
                                    x0=*px0++;
                                    x1=*px0++;
                                    x2=*px0++, y0=*py0++;
    p0=x0·y0, p1=x1·y0, p2=x2·y0,   x0=*px0++, y0=*py0++;
    /* Multiplication and accumulation step, instruction cycles: 5, 6, ..., N+4 */
    do N/3 {
    a0+=p0, a1+=p1, a2+=p2, p0=x1·y0, p1=x2·y0, p2=x0·y0, x1=*px0++, y0=*py0++;
    a0+=p0, a1+=p1, a2+=p2, p0=x2·y0, p1=x0·y0, p2=x1·y0, x2=*px0++, y0=*py0++;
    a0+=p0, a1+=p1, a2+=p2, p0=x0·y0, p1=x1·y0, p2=x2·y0, x0=*px0++, y0=*py0++;
    }
    /* Storing of the results */
    c[0]=a0;
    c[1]=a1;
    c[2]=a2;
    __________________________________________________________________________

The products formed by the three multipliers are referenced p0, p1 and p2, respectively. To a person of ordinary skill in the art, the routines of this program section will be apparent from this context and will not be further explained.
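For readers who want to trace the rotation of the three input registers, the program section can be modeled in plain C. This is a sketch under assumptions, not the processor's implementation: the function name corr3_pipelined and the int and long types are chosen here, and the result is read as c[i] = a[0]·b[i] + a[1]·b[i+1] + . . . + a[N-1]·b[N-1+i], which is what the listing accumulates but is not confirmed by the formula, since the formula is not reproduced in this text.

```c
#include <assert.h>

/* Model of the three-branch program section. N must be a multiple of 3.
 * Because of prefetching, a must hold N+2 elements and b N+4 elements. */
void corr3_pipelined(const int *a, const int *b, int N, long c[3])
{
    assert(N % 3 == 0);

    const int *py0 = a;              /* py0=&a[0] */
    const int *px0 = b;              /* px0=&b[0] */
    long a0 = 0, a1 = 0, a2 = 0;     /* accumulators of branches 4-0 .. 4-2 */
    long p0, p1, p2;                 /* product registers */
    int x0, x1, x2, y0;              /* input registers */

    x0 = *px0++;                                   /* cycle 1 */
    x1 = *px0++;                                   /* cycle 2 */
    x2 = *px0++; y0 = *py0++;                      /* cycle 3 */
    p0 = (long)x0 * y0; p1 = (long)x1 * y0; p2 = (long)x2 * y0;
    x0 = *px0++; y0 = *py0++;                      /* cycle 4 */

    /* Cycles 5 .. N+4: the oldest x register is reloaded in each cycle,
     * so the roles of x0, x1 and x2 rotate through the multipliers. */
    for (int k = 0; k < N / 3; k++) {
        a0 += p0; a1 += p1; a2 += p2;
        p0 = (long)x1 * y0; p1 = (long)x2 * y0; p2 = (long)x0 * y0;
        x1 = *px0++; y0 = *py0++;

        a0 += p0; a1 += p1; a2 += p2;
        p0 = (long)x2 * y0; p1 = (long)x0 * y0; p2 = (long)x1 * y0;
        x2 = *px0++; y0 = *py0++;

        a0 += p0; a1 += p1; a2 += p2;
        p0 = (long)x0 * y0; p1 = (long)x1 * y0; p2 = (long)x2 * y0;
        x0 = *px0++; y0 = *py0++;
    }

    c[0] = a0; c[1] = a1; c[2] = a2;
}
```

Only one x register is reloaded per cycle, which matches the single data bus 10 feeding x0, x1 and x2, while y0 is refreshed every cycle over the data bus 9.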

In lieu of the multipliers mentioned in the embodiments, to which multipliers data buffered in the input registers are applied selectively, it is also possible to use adders and/or subtracters. Furthermore, variations with respect to the function of the arithmetic/logic units (ALU) are possible. It is particularly advantageous to use the arithmetic/logic units for forming the products. With the aid of such arrangements it is then possible to rapidly determine coefficients having the form ##EQU6## (for example, for computing the distance in the domain of speech processing), or coefficients having the form ##EQU7##

What is claimed is:
 1. A signal processor comprising: at least one data source, at least four input registers having inputs coupled to the data source by a first data bus and a second data bus, wherein a first set of the at least four input registers is coupled only to the first data bus and a second set of the at least four input registers is coupled both to the first data bus and to the second data bus, processing means for processing data buffered in the at least four input registers by arithmetic and/or logic operations, said processing means being spread over a plurality of parallel data processing branches, and multiplexing means for selectively coupling the plurality of parallel data processing branches to respective outputs of the at least four input registers in dependence on control signals.
 2. The signal processor as claimed in claim 1, wherein the processing means comprises a plurality of multipliers spread over the plurality of parallel data processing branches for multiplying data buffered in the at least three input registers.
 3. The signal processor as claimed in claim 2, wherein the processing means further comprises arithmetic and/or logic units which have inputs for receiving products generated by the plurality of multipliers, and accumulator registers having inputs coupled to outputs of the arithmetic and/or logic units and having outputs coupled to inputs of the arithmetic and/or logic units by feedback paths.
 4. The signal processor as claimed in claim 1, wherein the processing means comprises at least two multipliers in different ones of the plurality of parallel data processing branches which can be coupled via the multiplexing means to two of the at least four input registers each.
 5. A mobile radio terminal comprising the signal processor as claimed in claim 1 for digital signal processing.
 6. A mobile radio base station comprising the signal processor as claimed in claim 1 for digital signal processing.
 7. A radio apparatus for digital radio comprising the signal processor as claimed in claim 1 for digital signal processing.
 8. An ISDN telephone comprising the signal processor as claimed in claim 1 for digital signal processing.
 9. A DECT system comprising the signal processor as claimed in claim 1 for digital signal processing.
 10. A method for parallel digital signal processing, comprising transmitting data in parallel from at least one data source to at least four input registers by a first data bus and a second data bus, wherein a first set of the at least four input registers is coupled only to the first data bus and a second set of the at least four input registers is coupled both to the first data bus and to the second data bus, and selectively transmitting data buffered in the at least four input registers in parallel via multiplexing means to respective parallel data processing branches of arithmetic and/or logic-operation processing means in dependence on control signals to the multiplexing means.
 11. The method as claimed in claim 10, wherein data buffered in one of the at least four input registers is used for forming products by a plurality of multipliers in successive instruction cycles.
 12. The method as claimed in claim 11, wherein at least one of the plurality of multipliers is used for squaring data of one of said at least four input registers.